Instacart Market Basket Analysis

Instacart, a well-known grocery delivery platform, has released its "Instacart Online Grocery Shopping Dataset 2017" to the public. It contains over 3 million orders from more than 200,000 users. Each user has between 4 and 100 orders, with details on the sequence in which products were added to the cart, the time of each order, and the interval between orders.

Objective:

The objective is to anticipate the products that will feature in a user's forthcoming order.

Before we start exploring the data in detail, let's gain a better understanding of the files provided. To achieve this, we'll begin by loading all the files into DataFrame objects and then examine the initial rows of each file.

Memory Reduction Function (reduce_memory_usage): The core of the script is the reduce_memory_usage function, which takes a DataFrame as input and iterates through its columns to reduce memory usage.
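The body of reduce_memory_usage is not shown here, so the following is only a minimal sketch of one common way to write such a function, assuming it downcasts each numeric column to the smallest dtype that still holds its values. The demo frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd


def reduce_memory_usage(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric columns downcast to smaller dtypes."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            # e.g. int64 -> int8/int16/int32, depending on the value range
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            # float64 -> float32 when the values survive the conversion
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Hypothetical miniature of the orders table.
orders_demo = pd.DataFrame({
    "order_id": np.array([1, 2, 3], dtype="int64"),
    "order_hour_of_day": np.array([8, 13, 17], dtype="int64"),
    "days_since_prior_order": np.array([np.nan, 7.0, 30.0], dtype="float64"),
})
reduced = reduce_memory_usage(orders_demo)

print(orders_demo.memory_usage(deep=True).sum(), "bytes before")
print(reduced.memory_usage(deep=True).sum(), "bytes after")
```

Working on a copy keeps the original frame untouched, which makes it easy to compare memory usage before and after.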

Reading The Data

Here's a breakdown of each table and its columns:

Order Table:

order_id: Unique identifier for each order.

user_id: Unique identifier for each user.

eval_set: Indicates which evaluation set this order belongs to (train, test, or prior).

order_number: The sequence number for this order (1 for the first order, 2 for the second, etc.).

order_dow: The day of the week the order was placed (0 for Sunday, 1 for Monday, etc.).

order_hour_of_day: The hour of the day the order was placed.

days_since_prior_order: Number of days since the previous order.

Products Table:

product_id: Unique identifier for each product.

product_name: Name of the product.

aisle_id: Identifier for the aisle where the product is located.

department_id: Identifier for the department where the product belongs.

Order Products Train Table:

order_id: Identifier for the order.

product_id: Identifier for the product.

add_to_cart_order: The order in which the product was added to the cart.

reordered: Indicates if the product was reordered in this order (1 for reordered, 0 otherwise).

Order Products Prior Table:

order_id: Identifier for the order.

product_id: Identifier for the product.

add_to_cart_order: The order in which the product was added to the cart.

reordered: Indicates if the product was reordered in this order (1 for reordered, 0 otherwise).

Departments Table:

department_id: Identifier for the department.

department: Name of the department.

Aisles Table:

aisle_id: Identifier for the aisle.

aisle: Name of the aisle.

Data Cleaning and Preprocessing:

Count of orders over the entire dataset

There are 3,421,083 orders in total. Of these, 3,214,874 are prior orders, 131,209 are users' last purchases given as the train set, and the remaining 75,000 are the test orders we need to predict.

Clients over the entire dataset

Out of a total of 206,209 customers, 131,209 customers have their last purchase included in the training set. The task involves predicting for the remaining 75,000 customers whose last purchases are not included in the training set.
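The counts above can be obtained by grouping the orders table on eval_set. The sketch below uses a made-up six-row stand-in for the orders table, so the numbers it prints are illustrative only.

```python
import pandas as pd

# Hypothetical miniature of the orders table.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5, 6],
    "user_id": [10, 10, 10, 20, 20, 20],
    "eval_set": ["prior", "prior", "train", "prior", "prior", "test"],
})

# Number of orders in each evaluation set.
orders_per_set = orders["eval_set"].value_counts()

# Number of distinct customers that appear in each evaluation set.
users_per_set = orders.groupby("eval_set")["user_id"].nunique()

print(orders_per_set)
print(users_per_set)
```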

Distribution of Orders by Hour of Day

Analyzing the distribution of orders by hour of the day provides valuable insights that can help businesses optimize their operations, improve customer experience, and drive growth and profitability.

Time of orders

Orders are predominantly placed during daytime hours, roughly between 8:00 AM and 5:00 PM.

Number of Orders by Day of Week

Number of Orders by Day of Week and Hour of Day

Based on the combination of day of week and hour of day, Saturday evenings and Sunday mornings appear to be the prime times for orders.
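A day-by-hour table like the one behind this observation can be built with a groupby followed by unstack. This is a sketch on made-up rows, not the notebook's original code.

```python
import pandas as pd

# Hypothetical miniature of the orders table.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "order_dow": [0, 0, 1, 1, 0],
    "order_hour_of_day": [10, 10, 9, 15, 14],
})

# Orders per hour of day, sorted by hour.
orders_per_hour = orders["order_hour_of_day"].value_counts().sort_index()

# Rows are days of the week, columns are hours; missing combinations become 0.
dow_hour_counts = (
    orders.groupby(["order_dow", "order_hour_of_day"])["order_id"]
    .count()
    .unstack(fill_value=0)
)

print(dow_hour_counts)
```

The resulting table can be passed directly to a heatmap plotting function to visualize busy periods.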

Merging the Prior and Train order_products
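The text does not say how the two tables are combined. Since they share the same four columns, one plausible approach is to stack them row-wise with pd.concat, as sketched here on made-up rows.

```python
import pandas as pd

# Hypothetical rows; both tables share the same four columns.
order_products_prior = pd.DataFrame({
    "order_id": [1, 1, 2],
    "product_id": [100, 200, 100],
    "add_to_cart_order": [1, 2, 1],
    "reordered": [0, 0, 1],
})
order_products_train = pd.DataFrame({
    "order_id": [3, 3],
    "product_id": [100, 300],
    "add_to_cart_order": [1, 2],
    "reordered": [1, 0],
})

# Stack the two tables and rebuild a clean 0..n-1 index.
order_products = pd.concat(
    [order_products_prior, order_products_train], ignore_index=True
)

print(order_products)
```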

Most Ordered Products (Analyzing Products)

Bestsellers: let's look at which products are sold most often (top 10). The clear winner is Bananas.

Most Reordered Products

Top 10 Products Added First to Basket: Analyzing products

The data reveals that certain products, such as Bananas, Bag of Organic Bananas, and Organic Whole Milk, are consistently among the top items added first to the basket. This suggests that these products hold significant importance to customers and are likely considered essential or preferred choices.

Most of them are organic products, and the majority of them are fruits.

Analyzing the top products with the highest ratio of reordered purchases.

Products like "Raw Veggie Wrappers," "Serenity Ultimate Extrema Overnight Pads," and "Orange Energy Shots" exhibit very high reorder ratios (ranging from 90% to 94%). This suggests a high level of customer loyalty, as the large majority of purchases of these items are repeat purchases.
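A per-product reorder ratio is the number of reorders divided by the number of purchases. The sketch below computes it on made-up rows. The MIN_PURCHASES threshold is an assumption added for illustration, since ratios for rarely bought products are noisy; the text does not state whether such a filter was used.

```python
import pandas as pd

# Hypothetical rows standing in for the merged order_products table.
order_products = pd.DataFrame({
    "order_id":   [1, 1, 2, 2, 3, 3],
    "product_id": [100, 200, 100, 300, 100, 200],
    "reordered":  [0, 0, 1, 0, 1, 1],
})
products = pd.DataFrame({
    "product_id": [100, 200, 300],
    "product_name": ["Banana", "Organic Whole Milk", "Orange Energy Shots"],
})

# Purchases and reorders per product.
product_stats = (
    order_products.groupby("product_id")["reordered"]
    .agg(purchases="count", reorders="sum")
    .reset_index()
)
product_stats["reorder_ratio"] = product_stats["reorders"] / product_stats["purchases"]

MIN_PURCHASES = 2  # hypothetical cutoff, not taken from the original analysis
top_reordered = (
    product_stats[product_stats["purchases"] >= MIN_PURCHASES]
    .merge(products, on="product_id")
    .sort_values("reorder_ratio", ascending=False)
)

print(top_reordered[["product_name", "purchases", "reorder_ratio"]])
```

Sorting product_stats by purchases instead of reorder_ratio gives the bestseller ranking.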

Defining and analyzing the comparison between purchases and reordered purchases.

The data provides valuable insights into product popularity, consumer behavior regarding reordering, and trends that can inform strategic decision-making for businesses in terms of marketing, inventory management, and product development.

• Top Purchased Items: "Banana," "Bag of Organic Bananas," and "Organic Strawberries" are among the most frequently purchased items, indicating their popularity among customers.

• Organic Preference: The presence of organic items indicates a preference for organic products among customers.

• Healthy Choices: Products such as "Organic Whole Milk," "Organic Baby Carrots," and "Organic Cucumber" reflect a trend towards healthier food choices.

Number of Items Bought

People usually order around 4 items. The distribution of the number of items per order is similar between the train and prior sets.

Minimum number of items bought per user: 4

Maximum number of items bought per user: 100
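The number of items in each order can be counted by grouping order_products on order_id. This is a sketch on made-up rows; the same computation can be run separately on the train and prior tables to compare them.

```python
import pandas as pd

# Hypothetical rows standing in for an order_products table.
order_products = pd.DataFrame({
    "order_id":   [1, 1, 1, 2, 2, 3],
    "product_id": [10, 20, 30, 10, 40, 10],
})

# Number of items in each order.
items_per_order = order_products.groupby("order_id")["product_id"].count()

# How many orders have 1 item, 2 items, and so on.
basket_size_distribution = items_per_order.value_counts().sort_index()

print(basket_size_distribution)
```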

Studying the count of days since the last order

Studying the count of days since the last order is important for businesses to better understand customer behavior, enhance customer engagement and retention, and optimize their operations and marketing strategies accordingly.

We observe that orders are most frequent when the number of days since the prior order is low. The frequency peaks when the days since the prior order range from 1 to 7, indicating that a significant portion of customers tends to reorder within a week. The frequency declines notably after the 14-day mark and is lowest beyond two weeks, before rising again to a peak at 30 days.

Effect of product position in the cart on the reorder ratio

It appears that products added to the cart initially have a higher probability of being reordered compared to those added later. This observation aligns with the common behavior of first adding frequently purchased items to the cart and then exploring new products. This trend suggests that customers prioritize replenishing familiar items before considering new ones during their shopping sessions.
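The relationship described above can be checked by averaging the reordered flag for each cart position. The rows below are made up to illustrate the computation.

```python
import pandas as pd

# Hypothetical rows standing in for the merged order_products table.
order_products = pd.DataFrame({
    "order_id":          [1, 1, 1, 2, 2, 2],
    "add_to_cart_order": [1, 2, 3, 1, 2, 3],
    "reordered":         [1, 1, 0, 1, 0, 0],
})

# Mean of a 0/1 flag is the share of reorders at each cart position.
ratio_by_position = (
    order_products.groupby("add_to_cart_order")["reordered"].mean()
)

print(ratio_by_position)
```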

Analyzing the distribution of products across departments (Analyzing departments)

Analyzing the distribution of purchases across departments provides valuable insights into customer behavior.

It appears that while the "Produce" department comprises only a small fraction of the total number of products available (2%), it accounts for a significant portion (29%) of the total purchases made.
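The two percentages compare a department's share of the product catalog with its share of purchased items. The sketch below shows one way to compute both on made-up tables; the department ids and names are placeholders.

```python
import pandas as pd

# Hypothetical miniature tables.
departments = pd.DataFrame({
    "department_id": [1, 2],
    "department": ["produce", "pantry"],
})
products = pd.DataFrame({
    "product_id": [10, 20, 30, 40],
    "department_id": [1, 2, 2, 2],
})
order_products = pd.DataFrame({
    "order_id": [1, 1, 2, 2, 3],
    "product_id": [10, 20, 10, 30, 10],
})

# Share of catalog products that belong to each department.
catalog_share = products["department_id"].value_counts(normalize=True)

# Share of purchased items that belong to each department.
purchase_share = (
    order_products.merge(products, on="product_id")["department_id"]
    .value_counts(normalize=True)
)

shares = (
    pd.DataFrame({"catalog_share": catalog_share, "purchase_share": purchase_share})
    .fillna(0.0)
    .rename_axis("department_id")
    .reset_index()
    .merge(departments, on="department_id")
)

print(shares[["department", "catalog_share", "purchase_share"]])
```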

Produce is the largest department by number of purchases. Now let us check the reorder percentage of each department.

Department wise reorder ratio:

Customers who purchase items from the Dairy & Eggs and Produce departments are likely to exhibit higher reordering tendencies compared to those who buy from the Personal Care category.

Now let us look at the important aisles (Analyzing sub-departments).

The top two aisles are fresh fruits and fresh vegetables.

Calculating the reorder ratio for each aisle

Aisle Performance: Calculating the mean of the 'reordered' column for each aisle helps us understand the average likelihood of products in that aisle being reordered.

Aisles like "Milk," "Water Seltzer Sparkling Water," "Fresh Fruits," and "Eggs" have relatively high reorder ratios, meaning customers frequently repurchase items from these sections. These products are essentials or frequently consumed household items.
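The per-aisle ratio can be sketched as two merges followed by a grouped mean. The tables below are made up; the same pattern works for departments by swapping aisle_id for department_id.

```python
import pandas as pd

# Hypothetical miniature tables.
aisles = pd.DataFrame({
    "aisle_id": [1, 2],
    "aisle": ["milk", "spices seasonings"],
})
products = pd.DataFrame({
    "product_id": [10, 20],
    "aisle_id": [1, 2],
})
order_products = pd.DataFrame({
    "product_id": [10, 10, 10, 20, 20],
    "reordered":  [1, 1, 0, 0, 0],
})

# Attach the aisle name to every purchased item, then average the 0/1 flag.
aisle_reorder_ratio = (
    order_products.merge(products, on="product_id")
    .merge(aisles, on="aisle_id")
    .groupby("aisle")["reordered"]
    .mean()
    .sort_values(ascending=False)
)

print(aisle_reorder_ratio)
```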

Analyzing organic products

Reordering Organic vs Non-Organic

People reorder organic products more often than non-organic products.
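The text does not say how organic products were identified. The sketch below assumes a product counts as organic when its name contains the word "organic" (case-insensitive), which is a heuristic and may not match the original analysis. The rows are made up.

```python
import pandas as pd

# Hypothetical miniature tables.
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": ["Bag of Organic Bananas", "Banana", "Organic Whole Milk"],
})
order_products = pd.DataFrame({
    "product_id": [1, 1, 2, 2, 3],
    "reordered":  [1, 1, 1, 0, 1],
})

# Assumption: "organic" in the product name marks an organic product.
is_organic = products["product_name"].str.contains("organic", case=False)
products["kind"] = is_organic.map({True: "organic", False: "non-organic"})

# Share of purchased items that are reorders, per product kind.
organic_reorder_ratio = (
    order_products.merge(products, on="product_id")
    .groupby("kind")["reordered"]
    .mean()
)

print(organic_reorder_ratio)
```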

Association between days since last order and the ratio of reorders

74% of products bought on the same day as the previous order are reorders.

69% of products bought one week after the previous order are reorders.

Conclusion:

If the next order is placed on the same day as the previous order, the percentage of reorders among its products is high.

If the next order is placed a week after the previous order, the percentage of reorders among its products is also high.
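Percentages like these come from joining each purchased item to its order's days_since_prior_order and averaging the reordered flag per gap length. The sketch below uses made-up rows, so its ratios differ from the figures quoted above.

```python
import pandas as pd

# Hypothetical miniature tables.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "days_since_prior_order": [0.0, 7.0, 30.0],
})
order_products = pd.DataFrame({
    "order_id":  [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "reordered": [1, 1, 1, 0, 1, 1, 0, 1, 0],
})

# Share of reordered items for each gap (in days) since the prior order.
reorder_ratio_by_gap = (
    order_products.merge(orders, on="order_id")
    .groupby("days_since_prior_order")["reordered"]
    .mean()
)

print(reorder_ratio_by_gap)
```

Each user's first order has no prior order, so its days_since_prior_order is missing; groupby drops those rows by default.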

Relationship between Add to Cart Order and Reorder Ratio

----------------------------------------------------------------------